PS2010 Workshops Code Book


Workshop 1: Describing Data


Exercise One

Create your .csv file. You need to do this in Excel. You should use the data below.

id water redcow
1 77 68
2 79 53
3 85 50
4 96 63
5 62 82

Your data should be in long format with three columns: id, drink, and sleep.
Note that the table above shows the values in wide format, so you will need to rearrange them when entering the data.

Make sure you save it as a .csv file.

Next we need to call any packages!

install.packages("tidyverse") #installs the tidyverse package if you don't have it already
library(tidyverse) #calls the tidyverse package.

Import your newly created data file into RStudio using read_csv().
Make sure your working directory is set up correctly.
If this works, you should see mydata appear in the environment (top right panel).

mydata <- read_csv("redcow.csv")
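If read_csv() complains that it can't find the file, it usually means the working directory is pointing somewhere else. A quick base-R check (the path in the commented line is just a made-up example):

```r
getwd()       # prints the folder R is currently reading files from
list.files()  # lists the files in that folder; redcow.csv should appear here
# setwd("C:/my_psych_files")  # hypothetical path: change it to wherever you saved the file
```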

Exercise Two

Today we will just quickly check our data.

head(mydata) # shows the first few rows of your data.
names(mydata) # shows the variable names.
summary(mydata) # provides summary statistics of a data set.

Exercise Three

descriptives_bygroup <- mydata %>% #Which data set to use
  group_by(drink) %>% # splits file by "drink"
  summarise(mean_sleep = mean(sleep), sd_sleep = sd(sleep)) # ask for the mean and standard deviation

print(descriptives_bygroup)
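For comparison, the same split-and-summarise idea can be done in base R with tapply(); the values below are made up, not the workshop data:

```r
# toy data frame standing in for mydata
toy <- data.frame(drink = c("water", "water", "redcow", "redcow"),
                  sleep = c(77, 79, 68, 53))

group_means <- tapply(toy$sleep, toy$drink, mean)  # mean sleep score per drink group
group_means
```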

Exercise Four (Optional)

descriptives_all <- mydata %>% #Which data set to use
  summarise(mean_sleep = mean(sleep), sd_sleep = sd(sleep)) # ask for the mean and standard deviation

print(descriptives_all)

END OF WORKSHOP


Workshop 2: T-tests (Independent)


Exercise One

We need to call any packages!

install.packages("rstatix")
library(tidyverse)
library(rstatix)

mydata <- read_csv("redcow2.csv")

Exercise Two

names(mydata) 
summary(mydata)

Exercise Three

# we want to produce some basic descriptive statistics for our data set.
# we know that we want to compare redcow with water, so use the same code from last week to produce mean values.
descriptives_bygroup <- mydata %>%
  group_by(drink) %>%
  summarise(m = mean(sleep), sd = sd(sleep))

# now we will print our descriptive stats in something called a tibble
print(descriptives_bygroup) # print to console
view(descriptives_bygroup) # opens in a new tab; this is usually the best option

Exercise Four

Run an independent t-test comparing sleep quality scores for the redcow vs. water groups.

# we want to run a t-test to compare sleep (dependent variable) across the two drink groups (independent variable: redcow and water)
redcow_t <- t.test(sleep ~ drink, mydata, var.equal = FALSE)

print(redcow_t)

# we also want to find and report the effect size. for a t-test we can use Cohen's d.
mydata %>% 
  cohens_d(sleep ~ drink, var.equal = FALSE)

Exercise Five

# we want to check any necessary assumptions that might change how we interpret our model
# we do not need to look at homogeneity of variance when using Welch's t-test.
# we only need to look at normality.
hist(mydata$sleep[mydata$drink == "redcow"]) #creates histogram for redcow
hist(mydata$sleep[mydata$drink == "water"]) #creates histogram for water

shapiro.test(mydata$sleep[mydata$drink == "redcow"]) #runs Shapiro test for redcow
shapiro.test(mydata$sleep[mydata$drink == "water"]) #runs Shapiro test for water

Exercise Six (Optional)

You will practice graphing your findings next week. But for those of you who want a head start, feel free to copy and run the code below to visually present your descriptive statistics.

# use the code to create a box plot of your descriptive statistics.
ggplot(mydata, aes(x = drink, y = sleep)) +
  geom_boxplot() +
  labs(title = "Sleep Quality Score for RedCow vs Water",
       x = "Drink",
       y = "Sleep Quality Score") +
  theme_classic()

END OF WORKSHOP


Workshop 3: T-tests (Repeated and One Sample)


Exercise One

We need to call any packages!

install.packages("rstatix")
library(tidyverse)
library(rstatix)

mydata <- read_csv("redcow3.csv")

Exercise Two

head(mydata)
summary(mydata) 

It looks as though the data are not in long format.
We need to create a new data set with the data in long format.
Thankfully we can do this with some simple code:

mydata_long <- mydata %>%               # creates a new object called mydata_long
  pivot_longer(cols = c(before, after), # use pivot_longer() and select the column names.
               names_to = "time",       # give a column name for our independent variable
               values_to = "bpm")       # give a column name for our dependent variable

Check that mydata_long has appeared in the environment (top right panel)
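To see what pivot_longer() is actually doing, here is the same wide-to-long stacking written out by hand in base R, on a tiny made-up data set (not the workshop file):

```r
# toy wide data: two participants measured before and after
wide <- data.frame(id = 1:2, before = c(70, 75), after = c(90, 88))

# long format stacks the two measurement columns into a single "bpm" column,
# with a "time" column recording which time point each value came from
long <- data.frame(
  id   = rep(wide$id, times = 2),
  time = rep(c("before", "after"), each = 2),
  bpm  = c(wide$before, wide$after)
)
long  # 4 rows: one row per participant per time point
```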


Exercise Three

Adapt the code below to produce descriptive statistics.
You need to use the variable names from your long data file: time and bpm.
Change NULL to the relevant details.

descriptives <- NULL %>%                       # Which data set will you use?
  group_by(NULL) %>%                          # what should you split the file by?
  summarise(mean = mean(NULL), sd = sd(NULL)) # mean and standard deviation of the dependent variable

view(NULL)

We can also look at a box plot too. Use the code below:

ggplot(mydata_long, aes(x = time, y = bpm, fill = time)) +
  geom_boxplot() +
  theme_classic()

Exercise Four

Run a repeated t-test comparing BPM before and after drinking RedCow.

# we want to run a t-test to compare the bpm (dependent variable) across time (levels: before vs. after)
before <- mydata_long$bpm[mydata_long$time == "before"] # tell R which data is "before"
after <- mydata_long$bpm[mydata_long$time == "after"]   # tell R which data is "after"

redcow_t <- t.test(before, after, paired = TRUE)

print(redcow_t)

# we also want to find and report the effect size. for a t-test we can use Cohen's d.
cohens_d(before, after, paired = TRUE)
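A useful way to sanity-check a paired t-test: it is mathematically identical to a one-sample t-test on the difference scores. A quick demonstration on made-up numbers (not the workshop data):

```r
# toy before/after scores for four participants
b <- c(70, 72, 68, 75)
a <- c(78, 80, 75, 82)

t_paired <- t.test(b, a, paired = TRUE)  # paired t-test
t_diff   <- t.test(b - a, mu = 0)        # one-sample t-test on the differences

t_paired$statistic  # the t value...
t_diff$statistic    # ...is the same both ways
```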

Exercise Five

Use the lines of code below to check the assumption of normality.
Remember to check back on the lecture slides if you need.
Firstly, we need to calculate a difference score.

# we need to check normality, but for the difference scores (see the lecture if this doesn't make sense)
# first we need to calculate the difference score
# we will go back to the original data file "mydata" for this, not "mydata_long"
diff_score = mydata$before - mydata$after 

hist(diff_score) #creates histogram for the diff_score
shapiro.test(diff_score) #runs Shapiro test for diff_score

Exercise Six

Re-run the analysis presented in lecture.
However, have you noticed how chocolate bars seem smaller and smaller these days?
Well, here at RedCow we also want to introduce smaller bars. Run the analysis using redcowchoc.csv, but this time the reference value is 30 grams.
Remember to amend the code below to replace NULL.

# import data
mydata2 <- read_csv("redcowchoc.csv")

# run one sample t-test against the reference value of 30g
t.test(mydata2$weight, mu = NULL)

Exercise Seven (Optional)

# 7.1 advancing data visualisation: boxplot

#reusing the code for the boxplot in exercise 3, let's tidy up the boxplot by adding some labels.
ggplot(mydata_long, aes(x = time, y = bpm)) +
  geom_boxplot() +
  labs(title = "Resting Heart Rate After Consumption of RedCow",
       x = "Time",
       y = "Median Beats per Minute (BPM)") +
  theme_classic()

## 7.2 advancing data visualisation: bar chart

#this is example code for a bar chart to look at for now, you'll learn more about how it works in future classes!
ggplot(mydata_long, aes(x = time, y = bpm)) +
  geom_bar(stat = "summary", fun = "mean", fill = "lightgrey", width = 0.7) +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", width = 0.2) +
  labs(title = "Resting Heart Rate After Consumption of RedCow",
       x = "Time Point",
       y = "Mean Beats per Minute (BPM)") +
  theme_classic()

END OF WORKSHOP


Workshop 4: One-Way ANOVA (Independent)


Exercise One

We need to call any packages!

# only install packages if you don't have them installed already
install.packages("emmeans")
install.packages("afex")
install.packages("car")

# otherwise just call the packages
library(tidyverse)
library(afex)
library(rstatix)
library(emmeans)
library(broom)
library(car)

mydata <- read_csv("earlybird.csv")

Exercise Two

# have a quick check of your data file to learn about it.
head(mydata)
names(mydata)
summary(mydata)

We need to code our variables.
Any categorical variable needs to be set up as a factor (factor()) and any scale/numerical variable needs to be set up as a numeric value (as.numeric()).
This is an important step because it could cause issues later if missed.

# now you are working with more complex data sets, we need to make sure variables are set-up correctly. This is quite an important step.
# we need to make sure any categorical variables are coded as a "factor" and any scale variables are coded as "numeric".
mydata$group <- factor(mydata$group)      #turns group into a factor
mydata$alert <- as.numeric(mydata$alert)  #turns this into numeric
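It is worth verifying the coding worked: str() shows each column's type and levels() shows a factor's categories. A sketch on a toy data frame (values made up, not the workshop file):

```r
# toy data frame standing in for mydata
toy <- data.frame(group = c("lark", "owl", "lark"), alert = c("5", "7", "6"))

toy$group <- factor(toy$group)       # categorical -> factor
toy$alert <- as.numeric(toy$alert)   # scale -> numeric

str(toy)           # each column's type at a glance
levels(toy$group)  # factor levels (alphabetical by default)
```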

Exercise Three

Adapt the code below to produce descriptive statistics (replace NULL).

descriptives <- NULL %>%                       # Which data set will you use?
  group_by(NULL) %>%                          # what should you split the file by?
  summarise(mean = mean(NULL), sd = sd(NULL)) # mean and standard deviation of the dependent variable

view(NULL)

We can also look at a box plot of our data. Use the code below to generate a box plot:

#how about a box plot too

ggplot(mydata, aes(x = group, y = alert, fill = group)) +
  geom_boxplot() +
  theme_classic()

Exercise Four

Run a one-way independent ANOVA comparing the three groups (lark, neither, owl) as the independent variable and the alertness score as dependent variable.
For this exercise you should run a post-hoc test.

#run the ANOVA model
anova_result <- aov_ez(id = "id",             # identifier variable
                       dv = "alert",          # dependent variable variable name
                       between = "group",     # between subjects variable name
                       type = 3,              # leave this as `3`
                       include_aov = TRUE,    # leave this as `TRUE`
                       data = mydata)         # name of your data file

# just some code to help tidy the output
anova_result_output <- (anova_result$anova_table) %>%
  tidy()
view(anova_result_output)

# we need eta squared for the effect size
install.packages("effectsize")
library(effectsize)
eta_squared(anova_result, partial = FALSE) # asks for eta square

Post-hoc Options

First we need to run emmeans()

# we need emmeans to run post-hocs
em_means <- emmeans(anova_result, ~ group)  # group is the between subjects variable in our data

Then set-up the pairwise comparisons and decide which alpha level adjustment to use.
Pick one from either:
1. Holm
2. Bonferroni
3. Tukey HSD

pairwise_contrasts <- contrast(em_means, method = "pairwise") # set up pairwise comparisons

print(pairwise_contrasts, adjust = "holm")                    # applies holm adjustment
print(pairwise_contrasts, adjust = "bonferroni")              # applies bonf adjustment
print(pairwise_contrasts, adjust = "tukey")                   # applies tukey adjustment
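To see how the Holm and Bonferroni adjustments differ, base R's p.adjust() applies the same corrections to a vector of raw p-values (the p-values below are invented for illustration):

```r
p <- c(0.010, 0.020, 0.400)  # hypothetical raw p-values from three comparisons

p.adjust(p, method = "bonferroni")  # every p multiplied by the number of tests (capped at 1)
p.adjust(p, method = "holm")        # step-down: smallest p x 3, next x 2, largest x 1
```

Notice Holm is never more severe than Bonferroni, which is why it is often the default choice.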

For future reference, here is the Games-Howell Option

# Games-Howell is an option for post-hocs when you have unequal variances
games_howell <- mydata %>%
  games_howell_test(alert ~ group)  # group is the between subjects variable in our data

print(games_howell)

Exercise Five

Ordinarily we wouldn’t run multiple analyses on the same data set (this is considered dodgy and is known as p-hacking).
But, for education purposes I’m going to ask you to break the rules and re-analyse the same data, this time using a planned contrast.

We will use the same ANOVA model from the previous exercise: anova_result The model is the same, we are just going to use a planned contrast instead of post-hoc.

# we need emmeans to run planned contrasts
em_means <- emmeans(anova_result, ~ group)  # group is the between subjects variable in our data

Let’s view the levels of your data and the order they’re in. This is important as we have to make sure the order matches our contrasts.

levels(mydata$group) # tells us the order to set up any specified contrasts.
                     # this order matters as it must match your contrast codes

Simple Contrast

simple_contrast <- contrast(em_means, method = list(
  "lark vs neither" = c(-1, 1, 0),      # compares the first level to the second level
  "lark vs owl" = c(-1, 0, 1)))         # compares the first level to the third level

print(simple_contrast, adjust = "holm")

Repeated Contrast

repeated_contrast <- contrast(em_means, method = list(
  "lark vs neither" = c(-1, 1, 0),      # compares the first level to the second level
  "neither vs owl" = c(0, -1, 1)))      # compares the second level to the third level

print(repeated_contrast, adjust = "holm")

Helmert Contrast

helmert_contrast <- contrast(em_means, method = list(
  "lark vs overall effect later levels" = c(-1, 0.5, 0.5), # first vs. overall effect later levels
  "neither vs overall effect later levels" = c(0, -1, 1))) # second vs. overall effect later levels

print(helmert_contrast, adjust = "holm")

Further explanation for coding contrasts

  • Use zero (0) for any level excluded from a contrast.
  • Use negative values for the baseline group.
  • Use positive values for the “treatment” group(s).
  • The sum of values in every contrast needs to equal zero.
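You can sanity-check that last rule in R before running anything; each set of contrast codes should sum to zero (the codes below are the Helmert ones from above):

```r
contrast_codes <- list(c(-1, 0.5, 0.5),  # first level vs. the later levels
                       c(0, -1, 1))      # second level vs. the third level

sapply(contrast_codes, sum)  # both sums should be 0
```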

Polynomial Contrast

poly_contrasts <- contrast(em_means, method = "poly") #sets up and runs the polynomial contrast

print(poly_contrasts)

Dunnett Contrast

# Dunnett contrast, which will compare a "reference condition" to each of the other conditions.

dunnett_contrasts <- contrast(em_means, method = "trt.vs.ctrl", ref = "lark")
# change "lark" to the name of the reference/control group/condition.

print(dunnett_contrasts)

Exercise Six

Use the code below to evaluate both normality and homogeneity of variance.

#testing normality, looking at normal distribution of residuals from the model.
residuals <- residuals(anova_result) #pulls residuals from the model

# produce QQ-Plot
qqPlot(residuals) #produces a qq-plot of residuals

# produce histogram of residuals
hist(residuals, breaks = 20, main = "Histogram of Residuals", xlab = "Residuals", col = "lightblue") 

# Shapiro test on residuals
shapiro.test(residuals) #Shapiro test for residuals

# testing homogeneity of variance
leveneTest(alert ~ group, mydata)

Exercise Seven (Optional)

Researchers from another lab decided to try and replicate these findings.
Load the data file “earlybird_replication.csv”.
Run the analysis using a one-way independent ANOVA. Remember to ask for a box plot.
You can re-use the code from exercise four.


END OF WORKSHOP


Workshop 5: One-Way ANOVA (Repeated)


Exercise One

We need to call any packages!

library(tidyverse)
library(broom)
library(afex)
library(emmeans)
library(car)

Import the memory.csv data set.


Exercise Two

head(mydata)
names(mydata)
summary(mydata)

The data set does not appear to be in long/tidy form.
We need to change that using pivot_longer. We will create a new version of the data but in long format.

#it looks like the data are not in long format. we need to fix that.
mydata_long <- mydata %>%                             # creates a new object called mydata_long
  pivot_longer(cols = c(year1, year2, year3, year4),  # the existing column names for the IV
               names_to = "time",                     # the name of your independent variable
               values_to = "score")                   # the name of your dependent variable

We need to code our variables.
Any categorical variable needs to be set up as a factor (factor()) and any scale/numerical variable needs to be set up as a numeric value (as.numeric()).
This is an important step because it could cause issues later if missed.

# we need to make sure any categorical variables are coded as a "factor" 
# and any scale variables are coded as "numeric"
mydata_long$time = factor(mydata_long$time) #turns time into a factor
mydata_long$score = as.numeric(mydata_long$score) #turns score into numeric

Exercise Three

Adapt the code to produce descriptive stats in the same way you have done previously.
*Hint: Use mydata_long as your data set.

descriptives <- NULL %>%                       # Which data set will you use?
  group_by(NULL) %>%                          # what should you split the file by?
  summarise(mean = mean(NULL), sd = sd(NULL)) # mean and standard deviation of the dependent variable

view(NULL)

Let’s also produce a violin plot to visually present the data.

ggplot(mydata_long, aes(x = time, y = score, fill = time)) +
  geom_violin(trim = FALSE) +  # violin plot without trimming tails
  geom_boxplot(width = 0.1, color = "black", alpha = 0.5) +  # overlay a boxplot for additional summary statistics
  theme_classic() +  # Classic theme
  labs(title = "Violin Plot of Memory Score by Group",  # add a title
       x = "Time Point",  # x-axis label
       y = "Memory Score")  # y-axis label

Exercise Four

Run a one-way repeated ANOVA comparing memory score across the four time points (year1, year2, year3, year4).
Adapt the code below. Change NULL to the relevant variables/information

anova_result <- aov_ez(id = "NULL",
                       dv = "NULL",
                       within = "NULL",
                       type = 3,
                       include_aov = TRUE,
                       data = NULL)

# just some code to help tidy the output
anova_result_output <- (anova_result$anova_table) %>%
  tidy()
view(anova_result_output)

# we need eta squared for the effect size
library(effectsize)
eta_squared(anova_result, partial = FALSE) # asks for eta square

We might also want to print() the ANOVA results, because the printed output shows whether a Greenhouse-Geisser correction has been applied.
You can check this in two ways:
1. If the degrees of freedom in the ANOVA output are not whole numbers (i.e. they have decimal places), the assumption of sphericity was not met and the model has automatically adjusted to account for this.
2. You can use the code below to print() the model and it will tell you if any correction was applied.

print(anova_result)  # prints the ANOVA result to the console (bottom left panel)

Once you have looked at the ANOVA output and interpreted, if there is a significant difference you should run planned or post-hoc comparisons.
For today’s example, we want to use a planned contrast.

First we need to pull out the emmeans().
Change NULL to the independent variable in the model: time.

# for any contrasts or post-hocs we need emmeans
em_means <- emmeans(anova_result, ~ NULL)   # NULL should be the independent variable.

Now let’s run the code for a polynomial contrast.

#let's run a polynomial contrast
poly_contrasts <- contrast(em_means, method = "poly") #sets up and runs the polynomial contrast

print(poly_contrasts)

It might help to visualise this effect.
Maybe a line chart could help…

#let's visualise the data to help our interpretation
ggplot(mydata_long, aes(x = time, y = score, fill = time)) +
  stat_summary(fun = mean, geom = "line", color = "black", size = 1, aes(group = 1)) +  # line for mean
  theme_classic() +
  labs(title = "Plot of Memory Score Across Time",  # add a title
       x = "Time Point",  # X-axis label
       y = "Memory Score")  # Y-axis label

Exercise Five

Ordinarily we wouldn’t run multiple analyses on the same data set (this is considered dodgy and is known as p-hacking).
But, for education purposes I’m going to ask you to break the rules and re-analyse the same data, this time adding a covariate.

You have five NULL values to change.
Set the covariate to education which is the number of years of education.
As this is a scale variable (not categorical), set factorize = FALSE.

# run the ANOVA model but this time with a covariate
# it is now an ANCOVA
# there is a `c` in the object name, it says ancova not anova
ancova_result <- aov_ez(id = "NULL",
                       dv = "NULL",
                       within = "NULL",
                       covariate = "NULL",
                       factorize = FALSE,
                       type = 3,
                       include_aov = TRUE,
                       data = NULL)

print(ancova_result)  # with a covariate in the model it is easier to just print

# we need eta squared for the effect size
library(effectsize)
eta_squared(ancova_result, partial = FALSE) #ask for eta square

We can also run polynomial contrasts on this model too.
Remember it is now called ancova_result and not anova_result.
That sneaky little c is in the ANCOVA model.

# for any contrasts or post-hocs we need emmeans
em_means <- emmeans(ancova_result, ~ time) # note ancova_result not anova_result

# let's run a polynomial contrast
poly_contrasts <- contrast(em_means, method = "poly") #sets up and runs the polynomial contrast

print(poly_contrasts)

Exercise Six

Use the code below to evaluate both normality and homogeneity of variance.
This will run it for the anova_result model from exercise four.
To check the ANCOVA model as well, just add the c: change anova_result to ancova_result in the first line.

#testing normality, looking at normal distribution of residuals from the model.
residuals <- residuals(anova_result) #pulls residuals from the model

# produce QQ-Plot
qqPlot(residuals) #produces a qq-plot of residuals

# produce histogram of residuals
hist(residuals, breaks = 20, main = "Histogram of Residuals", xlab = "Residuals", col = "lightblue") 

# Shapiro test on residuals
shapiro.test(residuals) #Shapiro test for residuals

END OF WORKSHOP


Workshop 6: Factorial ANOVA (Independent)


Exercise One

We need to call any packages!

library(tidyverse)
library(afex)
library(emmeans)

Import the memory.csv data set.

Exercise Two

Check the file.

# have a quick check of your data file to learn about it.
head(mydata)
names(mydata)
summary(mydata)
view(mydata)

We need to code our variables.

# we need to make sure any categorical variables are coded as a "factor" and any scale variables are coded as "numeric".
mydata$notification = factor(mydata$notification) #turns variable into a factor
mydata$usage = factor(mydata$usage, levels = c("rare", "regular", "problematic")) # turns into a factor and orders levels
mydata$anxiety = as.numeric(mydata$anxiety) # turns variable into numeric

Why did we order the usage factor?
R automatically puts factor levels in alphabetical order, so we changed these so they run from lowest to highest usage (i.e. rare, regular, then problematic as the third level).


Exercise Three

Generate descriptives.
As we have a factorial design, we have two categorical variables to include in the group_by() function. We can also ask for n() so we know how many participants are in each group.

# we have a factorial design so we need to group by BOTH independent variables
# we will also ask for n() to tell us about the group sizes
# we will also start using standard error (se)
descriptives <- mydata %>%
  group_by(notification, usage) %>%
  summarise(m = mean(anxiety), 
            sd = sd(anxiety),
            n = n(),
            se = sd/sqrt(n))

view(descriptives)
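The se line above is just the usual formula SE = SD / sqrt(n). A quick base-R check on toy numbers (not the workshop data):

```r
x  <- c(4, 6, 8, 10)           # toy scores for one group
se <- sd(x) / sqrt(length(x))  # standard error of the mean
se
```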

A box plot might also be useful.
Also try out ggsave().
You should find the file in your working directory.

#let's also visualise our data with a box plot to see what is going on.
ggplot(mydata, aes(x = usage, y = anxiety, fill = notification)) +
  geom_boxplot() +
  theme_classic()

ggsave("box_plot.jpg") # this will save your boxplot into your working directory, as a .jpg file.

You might prefer a violin plot.

# let's also visualise our data with a violin plot to see what is going on.
ggplot(mydata, aes(x = usage, y = anxiety, fill = notification)) +
  geom_violin(trim = FALSE, alpha = 0.1) +  # Violin plot without trimming tails
  geom_boxplot(width = 0.1, color = "black", alpha = 0.5, position = position_dodge(.9)) +  #Overlay a boxplot
  theme_minimal() +  # Classic theme
  labs(title = "Violin Plot of Anxiety Scores by Notification and Use",  # Add a title
       x = "Usage",  # X-axis label
       y = "Anxiety Score")  # Y-axis label

ggsave("violin_plot.jpg") # this will save your violin plot into your working directory, as a .jpg file.

Exercise Four

Run a 3x2 factorial ANOVA.

#we can run the ANOVA using aov_ez()
anova_result <- aov_ez(id = "id",
                        between = c("notification", "usage"), 
                        dv = "anxiety", 
                        type = 3,
                        include_aov = TRUE,
                        data = mydata)

factorial_output <- summary(anova_result) # this will organise the output ready for viewing

print(factorial_output) # print it to the console

library(effectsize)
eta_squared(anova_result, partial = TRUE) #ask for partial eta square

If there is a significant interaction, we need to break it down. A great starting point is to plot it.
Either use your box plot/violin plot, or use a line graph, which is a helpful way to interpret an interaction.

#let's plot it first
interaction.plot(mydata$usage,          # plot the x-axis
                 mydata$notification,   # plot the different lines
                 mydata$anxiety,        # state the dependent variable
                 fun = mean)            # ask for the mean values

Now let’s check out what is driving this interaction.

# first we need to get the emmeans
# then we need to ask for cohen's d too
posthoc_factorial <- emmeans(anova_result, 
                             pairwise ~ notification| usage, 
                             adjust = "holm")
output <-  posthoc_factorial$contrasts %>%
  summary() %>% 
  mutate(cohen_d = effectsize::t_to_d(t.ratio, df))

view(output)

There was a significant main effect of usage.
We could also interpret this; however, the interaction is the most important thing.
If you were to interpret the main effects (where there was no interaction), you can use the exact same code and steps as you used for a one-way ANOVA.

# we need emmeans to run post-hocs
em_means <- emmeans(anova_result, ~ NULL)  # change NULL to the significant main effect variable...(usage)

Then set-up the pairwise comparisons and decide which alpha level adjustment to use.
Pick one from either:
1. Holm
2. Bonferroni
3. Tukey HSD

pairwise_contrasts <- contrast(em_means, method = "pairwise") # set up pairwise comparisons

print(pairwise_contrasts, adjust = "holm")                    # applies holm adjustment
print(pairwise_contrasts, adjust = "bonferroni")              # applies bonf adjustment
print(pairwise_contrasts, adjust = "tukey")                   # applies tukey adjustment

For future reference, here is the Games-Howell Option.

# Games-Howell is an option for post-hocs when you have unequal variances
games_howell <- mydata %>%
  games_howell_test(anxiety ~ usage)  # usage is the between subjects variable in this data set

print(games_howell)

Exercise Five

# testing normality, looking at normal distribution of residuals from the model.
residuals <- residuals(anova_result) #pulls residuals from the model
qqPlot(residuals) #produces a qq-plot of residuals
hist(residuals, breaks = 20, main = "Histogram of Residuals", xlab = "Residuals", col = "lightblue") #histogram of residuals
shapiro.test(residuals) #Shapiro test for residuals

Arghhhhh, A VIOLATION! Is this a problem?
Well, Field et al. (2009) note that an ANOVA is robust to violations of both normality and homogeneity IF…
…sample sizes are equal. We can check this either by looking back at n() in your descriptives or by using the code below to count:

mydata %>% count(usage, notification)

Let’s not forget homogeneity of variance.

# testing homogeneity of variance
library(car)
leveneTest(anxiety ~ usage*notification, mydata)

Optional Code For Different Figures

# Optional code: try out the code below to produce different types of plots.
# plot 1
ggplot(mydata, aes(x = usage, y = anxiety, fill = notification)) +
  geom_boxplot(outlier.shape = NA, color = "black", width = 0.6, 
               alpha = 0.8, notch = FALSE) +  # Boxplot with black outline and transparency
  theme_classic(base_size = 16) +  # Classic theme with increased base font size
  theme(
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold", color = "black"),  # Centered, bold title
    axis.title.x = element_text(size = 16, face = "bold", color = "black"),  # Bold and larger x-axis title
    axis.title.y = element_text(size = 16, face = "bold", color = "black"),  # Bold and larger y-axis title
    axis.text.x = element_text(size = 14, color = "black"),  # Larger x-axis text
    axis.text.y = element_text(size = 14, color = "black"),  # Larger y-axis text
    legend.title = element_text(size = 16, face = "bold", color = "black"),  # Bold legend title
    legend.text = element_text(size = 14, color = "black"),  # Larger legend text
    panel.grid.major = element_line(color = "grey85", linetype = "dotted"),  # Light grey dotted major grid lines
    panel.grid.minor = element_blank(),  # Remove minor grid lines
    panel.background = element_rect(fill = "white", color = NA),  # White background for the panel
    plot.background = element_rect(fill = "white", color = NA),  # White background for the entire plot
    legend.position = "right",  # Position the legend to the right
    legend.key = element_rect(fill = "white", color = NA)  # White background for legend keys
  ) +
  scale_fill_brewer(palette = "Set3") +  # Use a colorblind-friendly palette
  labs(
    title = "Box Plot of Anxiety Scores by Usage and Notification Type",  # Clear title
    x = "Usage Type",  # X-axis label
    y = "Anxiety Score",  # Y-axis label
    fill = "Notification Type"  # Legend title
  ) +
  coord_flip()  # Optional: Flip coordinates for a horizontal box plot

#plot 2
ggplot(mydata, aes(x = usage, y = anxiety, fill = notification)) +
  geom_boxplot(outlier.shape = NA, color = "black", width = 0.6, alpha = 0.7) +  # Boxplot with no outliers and transparency
  theme_minimal(base_size = 16) +  # Minimal theme with increased base font size
  theme(
    plot.title = element_text(hjust = 0.5, size = 24, face = "bold", color = "black"),  # Centered, bold title
    axis.title.x = element_text(size = 18, face = "bold", color = "black"),  # Bold and larger x-axis title
    axis.title.y = element_text(size = 18, face = "bold", color = "black"),  # Bold and larger y-axis title
    axis.text.x = element_text(size = 14, color = "black"),  # Larger x-axis text
    axis.text.y = element_text(size = 14, color = "black"),  # Larger y-axis text
    legend.title = element_text(size = 16, face = "bold", color = "black"),  # Bold legend title
    legend.text = element_text(size = 14, color = "black"),  # Larger legend text
    panel.grid.major = element_line(color = "grey85", linetype = "dotted"),  # Light grey dotted major grid lines
    panel.grid.minor = element_blank(),  # Remove minor grid lines
    panel.background = element_rect(fill = "white", color = NA),  # White background for the panel
    plot.background = element_rect(fill = "white", color = NA),  # White background for the entire plot
    legend.position = "right",  # Position the legend to the right
    legend.key = element_rect(fill = "white", color = NA)  # White background for legend keys
  ) +
  scale_fill_manual(values = c("#0072B2", "#D55E00", "#009E73")) +  # Custom color palette for distinction and colorblind friendliness
  labs(
    title = "Anxiety Scores by Usage Type and Notification",  # Clear and concise title
    x = "Usage Type",  # X-axis label
    y = "Anxiety Score",  # Y-axis label
    fill = "Notification Type"  # Legend title
  ) +
  coord_flip()  # Optional: Flip coordinates for a horizontal box plot

# plot 3
ggplot(mydata, aes(x = usage, y = anxiety, fill = notification)) +
  geom_violin(trim = FALSE, alpha = 0.2, color = "black", lwd = 0.1) +  # Smooth violin plot with transparency and black outline
  geom_boxplot(width = 0.15, color = "black", alpha = 0.5, position = position_dodge(0.9)) +  # Overlay a slim boxplot for additional statistics
  stat_summary(fun = mean, geom = "point", shape = 20, size = 4, color = "black", position = position_dodge(0.9)) +  # Overlay mean points
  theme_minimal(base_size = 16) +  # Minimal theme with increased base font size
  theme(
    plot.title = element_text(hjust = 0.5, size = 24, face = "bold", color = "black"),  # Centered, bold title
    axis.title.x = element_text(size = 18, face = "bold", color = "black"),  # Bold and larger x-axis title
    axis.title.y = element_text(size = 18, face = "bold", color = "black"),  # Bold and larger y-axis title
    axis.text.x = element_text(size = 14, color = "black"),  # Larger x-axis text
    axis.text.y = element_text(size = 14, color = "black"),  # Larger y-axis text
    legend.title = element_text(size = 16, face = "bold", color = "black"),  # Bold legend title
    legend.text = element_text(size = 14, color = "black"),  # Larger legend text
    panel.grid.major = element_line(color = "grey85", linetype = "dotted"),  # Light grey dotted major grid lines
    panel.grid.minor = element_blank(),  # Remove minor grid lines
    panel.background = element_rect(fill = "white", color = NA),  # White background for the panel
    plot.background = element_rect(fill = "white", color = NA),  # White background for the entire plot
    legend.position = "right",  # Position the legend to the right
    legend.key = element_rect(fill = "white", color = NA)  # White background for legend keys
  ) +
  scale_fill_manual(values = c("#0072B2", "#D55E00", "#009E73")) +  # Custom color palette for distinction and colorblind friendliness
  labs(
    title = "Anxiety Scores by Usage Type and Notification",  # Clear and concise title
    x = "Usage Type",  # X-axis label
    y = "Anxiety Score",  # Y-axis label
    fill = "Notification Type"  # Legend title
  )

END OF WORKSHOP


Workshop 7: Factorial ANOVA (Repeated)


Exercise One

Load relevant packages and import this week’s data file.

library(tidyverse)
library(afex)
library(emmeans)
library(car)

mydata <- read_csv("FILE_NAME.csv") # swap FILE_NAME for this week's data file name

Exercise Two

Check the file.

# have a quick check of your data file to learn about it.
head(mydata)
names(mydata)
summary(mydata)
view(mydata)

That file does not look very tidy…

#it looks like the data are not in long tidy format. we need to fix that.
mydata_long <- mydata %>%
  pivot_longer(
    cols = starts_with("before") | starts_with("after"), # columns to pivot, anything that starts with "before" or "after"
    names_to = c("time", "exercise"), # names of new columns
    names_pattern = "(before|after)_(run|swim)", # pattern to separate columns
    values_to = "wellbeing" # Name of the value column
  )

view(mydata_long) # take a look at what the above code has done. nice and tidy!

Recode any variables as necessary.

#we need to make sure any categorical variables are coded as a "factor" and any scale variables are coded as "numeric".
mydata_long$time = factor(mydata_long$time, levels = c("before", "after")) #turns time into a factor and sets the level order.
mydata_long$exercise = factor(mydata_long$exercise) #turns exercise into a factor
mydata_long$wellbeing = as.numeric(mydata_long$wellbeing) #turns wellbeing score into numeric
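It is worth quickly confirming that the recoding worked before moving on. One optional way to check:

```r
# quick sanity check on the recoded variables
str(mydata_long)              # shows the class of each column
levels(mydata_long$time)      # should print "before" then "after"
levels(mydata_long$exercise)  # should print the exercise conditions
```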

Exercise Three

Generate descriptives.
As we have a factorial design, we have two categorical variables to include in the group_by() function. We can also ask for n() so we know how many participants are in each group.

# we have a factorial design so we need to group by BOTH independent variables
# we will also ask for n() to tell us about the group sizes
# we will also start using standard error (se)
descriptives <- mydata_long %>%
  group_by(time, exercise) %>%
  summarise(m = mean(wellbeing), 
            sd = sd(wellbeing), 
            n = n(),
            se = sd/sqrt(n))

view(descriptives)

How would you amend the code above to look at the descriptives comparing before vs. after, ignoring exercise type?
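One possible amendment is simply to drop exercise from group_by(), so the summary collapses across exercise type (the object name descriptives_time below is just a suggestion):

```r
# descriptives for before vs. after, ignoring exercise type
descriptives_time <- mydata_long %>%
  group_by(time) %>%
  summarise(m = mean(wellbeing),
            sd = sd(wellbeing),
            n = n(),
            se = sd/sqrt(n))

view(descriptives_time)
```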

A box plot might also be useful.
Also try out ggsave().
You should find the file in your working directory.

#let's also visualise our data with a box plot to see what is going on.
ggplot(mydata_long, aes(x = time, y = wellbeing, fill = exercise)) +
  geom_boxplot() +
  theme_classic()

ggsave("box_plot.jpg")
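By default ggsave() saves the most recently displayed plot at the size of your plotting window. You can also control the dimensions and resolution yourself (the values below are just illustrative):

```r
# optional: set the size and resolution of the saved file
ggsave("box_plot.jpg", width = 7, height = 5, units = "in", dpi = 300)
```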

Let’s try a bar chart…
Don’t worry about understanding all of this code.
You’ll have a bit of practice with ggplot in a future workshop.

#don't let this code scare you. But run it to produce a bar chart!
# you'll learn more about ggplot soon.
bar_chart <- ggplot(mydata_long, aes(x = time, y = wellbeing, fill = exercise, group = interaction(time, exercise))) + 
  stat_summary(
    fun = mean,          
    geom = "bar",          
    position = position_dodge(width = 0.9)
  ) +
  stat_summary(
    fun.data = mean_se,   # Calculate mean and standard error for error bars
    geom = "errorbar",    # Use error bars
    width = 0.2,          
    position = position_dodge(width = 0.9)
  ) + 
  scale_y_continuous(breaks = seq(0, 30, 2)) +  # Set breaks without limiting the scale
  coord_cartesian(ylim = c(0, 22)) +  # Zoom into the y-axis range from 0 to 22
  xlab("Time") + 
  ylab("Mean Wellbeing Score") + 
  scale_fill_brewer(palette = "Dark2") +  
  theme_classic(base_size = 12, base_family = "Arial") +  
  theme(
    legend.position = "bottom",  
    legend.title = element_blank(),
    legend.key = element_rect(fill = "white", colour = "white")
  ) +
  labs(
    caption = "Figure 1. Mean wellbeing scores before and after different exercises. Error bars represent standard errors."
  )

print(bar_chart)

ggsave("bar_chart.jpg") # this will save your bar chart plot into your working directory, as a .jpg file.

Exercise Four

Run a 2x2 factorial ANOVA.

#we can run the ANOVA using aov_ez()
anova_result <- aov_ez(id = "id",
                       within = c("time", "exercise"), 
                       dv = "wellbeing", 
                       type = 3,
                       data = mydata_long)

factorial_output <- summary(anova_result)
print(factorial_output)

library(effectsize)
eta_squared(anova_result, partial = TRUE) #ask for partial eta square

If there is a significant interaction, we need to break it down. A good starting point is to plot it.
You could use your box plot/violin plot, or a line graph, which is a particularly helpful way to interpret an interaction.

#let's plot it first
interaction.plot(mydata_long$exercise,  # plot the x-axis
                 mydata_long$time,      # plot the different lines
                 mydata_long$wellbeing, # state the dependent variable
                 fun = mean)            # ask for the mean values
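If you prefer ggplot, the same interaction plot can be sketched with stat_summary() (a minimal version, using the same variables as above):

```r
# a ggplot alternative to interaction.plot()
ggplot(mydata_long, aes(x = exercise, y = wellbeing,
                        colour = time, group = time)) +
  stat_summary(fun = mean, geom = "point", size = 3) + # mean of each cell
  stat_summary(fun = mean, geom = "line") +            # connect the means
  theme_classic()
```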

Now let’s check out what is driving this interaction.

# it looks pretty clear that wellbeing score is higher after compared to before. 
# but which exercise sees the bigger increase, running or swimming?
# to answer this we need to compare running vs. swimming at each level of time.
# this is the simple effects analysis.
# we need `pairwise ~ WHAT WE WANT TO COMPARE | WHAT VARIABLE WE WANT TO LOOK ACROSS`...
# or `pairwise ~ exercise | time`, see below:

#first we need to get the emmeans
#then we need to ask for cohen's d too
posthoc_factorial <- emmeans(anova_result, 
                             pairwise ~ exercise | time, 
                             adjust = "holm")
output <-  posthoc_factorial$contrasts %>%
  summary() %>% 
  mutate(cohen_d = effectsize::t_to_d(t.ratio, df))

view(output)

What about sphericity? Well, an important announcement:
SPHERICITY DOES NOT APPLY WHEN THERE ARE ONLY TWO LEVELS
This was a 2x2 design. Had it been 3x2 or 3x3, we'd need to consider sphericity… …which you will do next week.


Evaluate the model…

# testing normality, looking at normal distribution of residuals from the model.
residuals <- residuals(anova_result) #pulls residuals from the model
qqPlot(residuals) #produces a qq-plot of residuals
hist(residuals, breaks = 20, main = "Histogram of Residuals", xlab = "Residuals", col = "lightblue") #histogram of residuals
shapiro.test(residuals) #Shapiro test for residuals

Exercise Five

Take the time to write up your results.
Use APA format and remember to present descriptive statistics.
It is important to practise this because you’ll soon be doing it for your lab report!


END OF WORKSHOP


Workshop 8: Factorial ANOVA (Mixed)


Download the .csv file from Moodle and conduct the appropriate analysis, following the process below:

library(tidyverse)
library(afex)
library(emmeans)
library(car) # needed later for qqPlot()

mydata <- read_csv("FILE_NAME.csv") # swap FILE_NAME for the file you downloaded from Moodle

#it looks like the data are not in long or tidy format. we need to fix that.
mydata_long <- mydata %>%
  pivot_longer(
    cols = word_list:problem_solving,
    names_to = "task_type",
    values_to = "score")

#we need to make sure any categorical variables are coded as a "factor" and any scale variables are coded as "numeric".
mydata_long$music = factor(mydata_long$music, levels = c("silence", "classical", "pop")) #turns music into a factor
mydata_long$task_type = factor(mydata_long$task_type, levels = c("word_list", "comprehension", "problem_solving")) #turns task_type into a factor
mydata_long$score = as.numeric(mydata_long$score) #turns score into numeric

Replace NULL where necessary.
You have used this code previously.

descriptives <- NULL %>%
  group_by(NULL, NULL) %>%
  summarise(m = mean(NULL),
            sd = sd(NULL),
            n = n(),
            se = sd/sqrt(n))
ggplot(mydata_long, aes(x = NULL, y = NULL, fill = NULL)) +
  geom_boxplot() +
  theme_classic()
anova_result <- aov_ez(id = "NULL",
                       within = "NULL",
                       between = "NULL",
                       dv = "NULL", 
                       type = 3,
                       data = NULL)

#tidy the output. tidy() comes from the broom package, which installs with the tidyverse but is not loaded by library(tidyverse)
library(broom)
anova_result_output <- (anova_result$anova_table) %>%
  tidy()
view(anova_result_output)

#what if you want the sphericity information
factorial_output <- summary(anova_result)
print(factorial_output)
interaction.plot(mydata_long$NULL,      # plot the x-axis
                 mydata_long$NULL,      # plot the different lines
                 mydata_long$NULL,      # state the dependent variable
                 fun = mean)            # ask for the mean values

emmeans_task <- emmeans(anova_result, ~ NULL | NULL)
contrast_task <- contrast(emmeans_task, method = "pairwise", adjust = "holm")
summary(contrast_task)
#testing normality, looking at normal distribution of residuals from the model.
residuals <- residuals(anova_result) #pulls residuals from the model
qqPlot(residuals) #produces a qq-plot of residuals
hist(residuals, breaks = 20, main = "Histogram of Residuals", xlab = "Residuals", col = "lightblue") #histogram of residuals
shapiro.test(residuals) #Shapiro test for residuals

library(car)
leveneTest(score ~ music, mydata_long)

END OF WORKSHOP


Workshop 9: Data Visualisation


Exercise One

Load relevant packages and import this week’s data file.

library(tidyverse)

mydata <- read_csv("FILE_NAME.csv") # swap FILE_NAME for this week's data file

Exercise Two

Check the file and code any variables.

# have a quick check of your data file to learn about it.
head(mydata)
names(mydata)
summary(mydata)

#we need to code variables and re-order them to a meaningful order.
mydata$time = factor(mydata$time, levels = c("morning", "afternoon")) #turns into a factor
mydata$drink = factor(mydata$drink, levels = c("coffee", "tea")) #turns into a factor
mydata$score = as.numeric(mydata$score) #turns into numeric

Exercise Three

A note on ggplot2:
ggplot2 is a separate package, but it comes bundled within the tidyverse.
ggplot2 is a very powerful data visualisation package.
This workshop will only give you an introduction to the basics…
…particularly the style of figures you might use for lab report one.
However, there is so much more you can do with ggplot2 and there are lots of resources online.

# step by step
# layer 1: specify data and aesthetics using aes()
ggplot(mydata, aes(x = NULL, y = NULL, fill = NULL))

+

# layer 2: present the values using stat_summary()
  stat_summary(fun = mean, geom = "bar", position = position_dodge(), color = "black")

+

# layer 3: add error bars
  stat_summary(fun.data = mean_se, geom = "errorbar", 
               position = position_dodge(0.9), width = 0.25) 

+

# layer 4: labels with labs()
labs(
    title = NULL,   # leave this as NULL because APA figures do not use titles.
    x = "NULL",  
    y = "NULL")

+

# layer 5: group/condition names and colours
scale_fill_manual(values = c("morning" = "slategray4", "afternoon" = "slategray2"), 
                    labels = c("Morning", "Afternoon")) +
scale_x_discrete(labels = c("coffee" = "Coffee", "tea" = "Tea"))

+

# layer 6: add the theme to tidy up
theme_minimal(base_size = 14) +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(color = "black", fill = NA, linewidth = 1),
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.title = element_text(face = "bold"),
    axis.title.x = element_text(face = "bold"),
    axis.title.y = element_text(face = "bold"))
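Note that a `+` on its own line will not run in R, so the layers above are shown separately only to explain the structure. Assembled into one runnable call, with the NULL placeholders filled in using this week's variables (drink, score, and time), the full bar chart might look like this:

```r
# all six layers combined into one runnable plot
ggplot(mydata, aes(x = drink, y = score, fill = time)) +
  stat_summary(fun = mean, geom = "bar", position = position_dodge(), color = "black") +
  stat_summary(fun.data = mean_se, geom = "errorbar",
               position = position_dodge(0.9), width = 0.25) +
  labs(
    title = NULL,   # APA figures do not use titles
    x = "Drink",
    y = "Productivity Score") +
  scale_fill_manual(values = c("morning" = "slategray4", "afternoon" = "slategray2"),
                    labels = c("Morning", "Afternoon")) +
  scale_x_discrete(labels = c("coffee" = "Coffee", "tea" = "Tea")) +
  theme_minimal(base_size = 14) +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(color = "black", fill = NA, linewidth = 1),
    legend.title = element_text(face = "bold"),
    axis.title.x = element_text(face = "bold"),
    axis.title.y = element_text(face = "bold"))
```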

You can also run the code below to produce different types of figures.
You might choose to use this code when you come to write your lab report.

# violin plot (with box plot) complete code example
# you might use and tweak this code for your lab report.
ggplot(mydata, aes(x = drink, y = score, fill = time)) +
  geom_violin(trim = TRUE, scale = "area", alpha = 0.2, color = "white", linewidth = 0.1) +
  geom_boxplot(width = 0.2, position = position_dodge(0.9), color = "black", alpha = 0.8, outlier.shape = 1) +
  labs(
    title = NULL,
    x = "Drink",
    y = "Productivity Score",
    fill = "Time of Day") +  # legend title
  scale_fill_manual(
    values = c("morning" = "#1f77b4", "afternoon" = "#ff7f0e"),
    labels = c("Morning", "Afternoon")) +  # legend labels
  scale_x_discrete(
    labels = c("coffee" = "Coffee", "tea" = "Tea")) +  # x-axis labels
  theme_classic(base_size = 14) +
  theme(
    panel.grid.major = element_blank(),  # Remove major gridlines
    panel.grid.minor = element_blank(),  # Remove minor gridlines
    panel.border = element_rect(color = "black", fill = NA, linewidth = 1),
    plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
    legend.title = element_text(face = "bold"),
    axis.title.x = element_text(face = "bold"),
    axis.title.y = element_text(face = "bold"),
    axis.text.y = element_text(size = 12, face = "bold"))  # Added bold to y-axis text
#  box plot code example
# you might use and tweak this code for your lab report.
ggplot(mydata, aes(x = drink, y = score, fill = time)) +
  geom_boxplot(
    width = 0.5,
    alpha = 0.7,
    color = "black",
    linewidth = 0.2) +
  # Customizing labels and themes
  labs(
    title = NULL,
    x = "Drink Type",
    y = "Productivity Score",
    fill = "Time") +  # legend title
  scale_fill_manual(
    values = c("morning" = "#1f77b4", "afternoon" = "#ff7f0e"),
    labels = c("Morning", "Afternoon")) +
  scale_x_discrete(labels = c("coffee" = "Coffee", "tea" = "Tea")) +  # x-axis labels
    # Ensure the y-axis starts at zero
  coord_cartesian(ylim = c(0, NA)) +
  theme_minimal(base_size = 16) +
  theme(
    panel.grid.major = element_blank(),  # Remove major gridlines
    panel.grid.minor = element_blank(),  # Remove minor gridlines
    panel.border = element_rect(color = "gray80", fill = NA, linewidth = 0.5),
    plot.title = element_text(hjust = 0.5, face = "bold", size = 18),
    legend.title = element_text(face = "bold", size = 14),  
    legend.position = "top",
    axis.title.x = element_text(face = "bold", size = 14),
    axis.title.y = element_text(face = "bold", size = 14),
    axis.text.x = element_text(size = 12, face = "bold"), 
    axis.text.y = element_text(size = 12))

Check your output against the example figures provided with this workshop.


Part Two of this Workshop does not use R.


END OF WORKSHOP



Workshop 10: Non-Parametric Analyses